Towards Semantic Microaggregation of Categorical Data for Confidential Documents
Authors
Daniel Abril, Guillermo Navarro-Arribas, Vicenç Torra
Abstract
In the context of data privacy, and specifically of statistical disclosure control, microaggregation is a well-known microdata protection method that ensures the confidentiality of each individual. In this paper, we propose a new microaggregation approach for semantic sets of categorical data, such as text documents. The method relies on the WordNet framework, which provides a complete taxonomy of semantic relationships between words. This extension therefore aims to ensure the confidentiality of text documents while preserving their general meaning. We evaluate the quality of the protection method using measures based on information loss.
URL http://www.springerlink.com/content/f41402862155w6t4/
Source URL: https://www.iiia.csic.es/en/node/54964
Similar resources
Scientific papers on semantics and aggregation procedures for SDC of qualitative variables
Microaggregation is a masking procedure used to protect confidential data prior to their public release. This technique, which relies on clustering and aggregation, is used solely for numerical data. In this work we introduce a microaggregation procedure for categorical variables. We describe the new masking method and analyse the results it obtains according to some indices fo...
Towards a private vector space model for confidential documents
In this paper we introduce a method to anonymize document vector spaces. These vector spaces can be used to analyze confidential documents without disclosing private information. The method is inspired by microaggregation, a popular technique used in statistical disclosure control.
URL http://doi.acm.org/10.1145/2480362.2480543
Source URL: https://www.iiia.csic.es/en/node/54488
Semantic adaptive microaggregation of categorical microdata
In the context of Statistical Disclosure Control, microaggregation is a privacy-preserving method that aims to mask sensitive microdata prior to publication. It iteratively creates clusters of at least k elements and replaces them with their prototype so that they become k-indistinguishable (anonymous). This data transformation produces a loss of information with regard to the original dataset wh...
Spherical microaggregation: Anonymizing sparse vector spaces
Unstructured texts are a very popular data type, yet they remain largely unexplored in the field of privacy-preserving data mining. We consider the problem of providing public information about a set of confidential documents. To that end we have developed a method to protect a Vector Space Model (VSM), so that it can be made public even if the documents it represents are private. This method is inspired by ...